Abstract

In this study, two homology models (denoted MproST and MproSH) of the main proteinase
(Mpro) from the novel coronavirus associated with severe acute respiratory syndrome
(SARS-CoV) were constructed based on the crystal structures of Mpro from transmissible
gastroenteritis coronavirus (TGEV) (MproT) and human coronavirus HCoV-229E (MproH),
respectively. Both MproST and MproSH adopt folds similar to those of their respective
template proteins. These homology models reveal three distinct functional domains as well
as an intervening loop connecting domains II and III, as found in both template proteins.
A catalytic cleft containing the substrate binding sites S1 and S2 is also observed between
domains I and II. S2 undergoes more significant structural fluctuation than S1 during the
400 ps molecular dynamics simulations because it is located at the open mouth of the
catalytic cleft, whereas S1 is situated at the very bottom of this cleft. The thermal
unfolding of these proteins begins at domain III, where the structure is least conserved
among these proteins. Mpro may still maintain its proteolytic activity while it is partially
unfolded. The electrostatic interaction between Arg40 and Asp186 plays an important role
in maintaining the structural integrity of both S1 and S2.

Key words: Homology, Main Proteinase, Coronavirus, Severe acute respiratory syndrome 
(SARS), Substrate binding site, Molecular dynamics simulations. 

Introduction 

An outbreak of atypical pneumonia, designated severe acute respiratory syndrome
(SARS), was first reported in Guangdong Province of China in late 2002 and rapidly
spread to several countries (1, 2). SARS infection is usually characterized by high
fever, malaise, rigor, headache, and nonproductive cough, and may progress to
generalized, interstitial infiltrates in the lung (3). The sequence of the complete
genome of the coronavirus associated with SARS (SARS-CoV) has been determined and
characterized for two different isolates (4, 5). Phylogenetic analyses and sequence
comparisons have further revealed that SARS-CoV is not closely related to any of the
three groups of coronaviruses.

Coronaviruses belong to a diverse group of positive-stranded RNA viruses featuring
the largest viral RNA genomes. They share a similar genome organization and common
transcriptional and translational processes with the Arteriviridae (6, 7). The
human coronavirus HCoV-229E replicase gene encodes two overlapping polyproteins,
pp1a and pp1ab (8), that mediate all the functions required for viral replication
and transcription (9). The functional polypeptides are released from the
polyproteins by extensive proteolytic processing, which is primarily achieved by
the 33.1-kDa HCoV-229E main proteinase (Mpro) (10). Mpro is also commonly called
3C-like proteinase (3CLpro) to indicate the similarity of its cleavage site
specificity to that observed for picornavirus 3C proteinase (3Cpro) and the
identification of a Cys residue as the principal nucleophile in the context of a
predicted two-β-barrel fold (11, 12). Mpro from HCoV-229E (MproH) has been
biosynthesized in Escherichia coli, and the enzyme properties, inhibitor profile,
and substrate specificity of the purified protein have been well characterized
(10, 13).

Abbreviations: 3CLpro: 3C-like proteinase; 3D: Three-dimensional; ASA: Accessible surface area;
CVFF: Consistent valence force field; DSSP: Dictionary of secondary structure pattern; MD: Molecular
dynamics; Mpro: Main proteinase; MproH: Main proteinase of human coronavirus HCoV-229E; MproS:
Main proteinase of SARS-CoV; MproSH: Homology model of MproS based on the crystal structure of
MproH; MproST: Homology model of MproS based on the crystal structure of MproT; MproT: Main
proteinase of TGEV; PBC: Periodic boundary condition; RMSD: Root-mean-square deviation; SARS:
Severe acute respiratory syndrome; SARS-CoV: Coronavirus associated with SARS; SCR: Structurally
conserved region; TGEV: Transmissible gastroenteritis coronavirus

Recently, the crystal structures of MproH (14) and of Mpro from porcine coronavirus
(transmissible gastroenteritis virus, TGEV) (MproT) complexed with its inhibitor
(15) have been determined. In addition, homology models of MproS based on the
crystal structures of MproH (14) and MproT (16, 17) have also been constructed.
Comparison of these structures reveals a remarkable degree of conservation of the
substrate binding sites, which is further supported by the cleavage of the substrate
for MproT by the recombinant MproS (14). In addition, MproS exhibits 40 and
44% sequence identity to MproH and MproT, respectively (14).

Molecular dynamics (MD) simulations at the atomic level have been intensively
performed to gain insight into protein unfolding from the native state induced by
raising the temperature (18-20), changing the solvent (21), or increasing the
pressure (22). Usually, temperatures in the range of 400 to 600 K are employed.
According to the Arrhenius equation, the unfolding rate is expected to be
approximately 10^3-, 10^6-, and 10^9-fold faster than that observed experimentally
when the temperature is increased by 100, 200, and 300 °C, respectively (23).
Daggett and Levitt (24) have shown that the results obtained from MD simulations of
protein unfolding induced by increasing the temperature should be reliable by
comparing their results with the pH-induced denaturation of barnase (25).
Previously, several MD simulations, homology modeling, and molecular docking
experiments have been successfully conducted on various target proteins in our
group (26-35). In this paper, two homology models of Mpro from SARS-CoV (MproS),
denoted MproST and MproSH, were constructed based on the crystal structures of
MproT (15) and MproH (14), respectively. Subsequently, MD simulations combined with
the temperature jump technique were conducted to investigate the structural
variations of these proteins. Beyond the continued characterization of Mpro from
various coronaviruses, the amino acid sequence alignment, structural homology
analyses, and MD simulations of MproS presented in this study should provide a
useful basis for the further structure-based design of anti-SARS drugs.

Methods 

Model Proteins 

Two homology models of MproS (MproST and MproSH) were constructed based on the
monomer of the three-dimensional (3D) structure of MproT refined to 1.96 Å
resolution (15) (Fig. 1A) and that of MproH solved at 2.54 Å resolution (14)
(Fig. 1B), which were obtained from the Protein Data Bank (PDB; accession numbers
1LVO and 1P9U, respectively). The inhibitor, a substrate-analog hexapeptidyl
chloromethyl ketone, was removed from the crystal structure of MproT before it was
used as the template. Unfavorable nonphysical contacts in these structures were
then eliminated using the Biopolymer module of the Insight II program (Accelrys,
San Diego, CA, USA) with the Discover CVFF (consistent valence force field)
(36-38) on an SGI O200 workstation with 64-bit MIPS RISC R12000 2 x 270 MHz CPU
and PMC-Sierra RM7000A 350 MHz processor (Silicon Graphics, Inc., Mountain View,
CA, USA), followed by 10,000 iterations of energy minimization using the steepest
descent method, to yield the model proteins for further structure building.

Figure 1: The crystal structures of (A) MproT (15) and (B) MproH (14) and the
homology models of (C) MproST and (D) MproSH. These structures are visualized with
the Insight II program. The N- and C-termini are indicated. Secondary structure
elements are labeled as in Table I. α-Helices are shown as red cylinders, while
β-strands are illustrated as arrows pointing from the N- to the C-terminus. The
polypeptide backbones belonging to the turn and random coil regions are shown in
blue and green, respectively. The general acid-base catalyst His residue and the
nucleophilic Cys residue are labeled. The locations of the putative substrate
binding sites S1 and S2 are indicated.

Figure 2: Amino acid sequence alignment of MproT, MproH, and MproS. Secondary
structures as defined in the crystallographic structure of MproT (15) are shown on
top. The start and end amino acid residues are numbered in the brackets on the left
and right of each sequence, respectively. Residues totally conserved in all
sequences are indicated in red letters with green background. Residues conserved in
MproT and MproH but different from those in MproS are represented in black letters
with yellow background. Residues where variations occur are given in blue or brown
letters with grey background. The amino acid residues missing in both MproT and
MproH are shown as dashed lines.

Table I

The amino acid sequence identities among MproH, MproT, and MproS.

                     Identity (%)
                     Total    Domain I    Domain II    Domain III
MproH and MproT      60.80    63.44       65.06        55.45
MproH and MproS      40.19    41.94       45.78        35.64
MproT and MproS      43.85    44.09       49.40        39.22

Structural Homology 

Homology modeling utilizes structure and sequence similarities to predict unknown
protein structures. The Homology module in Insight II allows us to build 3D models
of the target protein (i.e., MproS) using both its amino acid sequence and the
structures of known, related template proteins (i.e., MproH and MproT). The
Homology program provides simultaneous optimization of both structure and
sequence homologies for multiple proteins in a 3D graphics environment, based on
a method developed by Greer (39). Amino acid sequences of MproH (Accession
Q05002) (40), MproT (NC_002306.2) (41), and MproS (NC_004718.3) (4) were
obtained from either the Swiss-Prot or the NCBI database. Smith-Waterman pairwise
amino acid sequence alignments were performed based on the conserved structural
features among Mpro from various coronaviruses to find the locations of the active
site and the substrate binding sites S1 and S2 of MproS. The consensus structurally
conserved regions (SCRs) of MproS were generated from alignments of the target
protein to the template proteins. The atomic coordinates were then transferred from
the template proteins to MproS in each SCR using the Mutation Matrix module of the
Insight II program. Automatic loop building was performed either by database
searching (42) or by generation through random conformational search (43). The
coordinates at the N- and C-termini of these loops were then automatically
assigned. Side chains of MproS were automatically replaced, preserving the
conformations of the template proteins. The side chain conformations were optimized
either manually or automatically using a rotamer library (44). Secondary structure
motifs were identified by database searching and defined by DSSP (45). The bond
lengths and torsion angles in the SCRs and loop regions were repaired and relaxed
using Homology/Refine/SpliceRepair and Homology/Refine/Relax, respectively.
The newly built structures of MproS were substantially refined to avoid van der
Waals radius overlap, unfavorable atomic distances, and undesirable torsion
angles using the molecular mechanics and dynamics features of the Discover module.
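
The Smith-Waterman pairwise alignment mentioned above can be illustrated with a minimal sketch. This is not the Insight II implementation: the scoring scheme (fixed match, mismatch, and linear gap scores) and the two toy sequences are assumptions made purely for illustration, whereas a production alignment of proteinase sequences would normally use a substitution matrix and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment of strings a and b with a simple linear gap score.

    Returns (best_score, aligned_a, aligned_b). The scoring values are
    illustrative assumptions, not those used in the study.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming matrix; scores never drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Toy sequences (hypothetical), sharing the local segment "HELLO".
    print(smith_waterman("XXHELLOYY", "ZZHELLOWW"))
```

Because the matrix is clamped at zero, the traceback recovers only the best-scoring local segment, which is the property that makes this family of algorithms suitable for locating conserved regions in otherwise divergent sequences.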

Molecular Dynamics Simulations 

The crystal structures of MproH and MproT and the homology models MproSH
and MproST were subjected to energy minimization by the steepest descent method
with 3,000 iterations followed by the Newton-Raphson method with 5,000 iterations,
and the results were used as the initial energy-minimized structures for further
structural comparison. Each energy-minimized structure was subsequently placed
in the center of a lattice with a size of 50 x 60 x 85 Å^3 filled with 6,222,
5,866, 5,836, and 5,776 water molecules for the systems of MproH, MproT, MproSH,
and MproST, respectively. These systems, composed of the protein and water
molecules, were then equilibrated by performing 20,000 steps of steepest descent
minimization and 10 ps of dynamics calculations. The explicit image periodic
boundary condition (PBC) was used for solvent equilibration. At the end of explicit
image equilibration, Discover re-images any molecule whose center of mass has moved
out of the lattice in order to maintain the integrity of the lattice with a
relatively constant density. Finally, a 400 ps MD simulation was carried out for
each system using the Discover module of Insight II. The temperature and pressure
were maintained for each MD simulation by weakly coupling the system to a heat bath
at 300, 400, or 600 K and an external pressure bath at one atmosphere with a
coupling constant of 0.5 ps, according to the method described by Berendsen et al.
(46). A cut-off radius of 13 Å for the non-bonded interactions was applied to each
MD simulation. The time step of the MD simulations was 1 fs. The trajectories and
coordinates of these structures were recorded every 2 ps for further analyses.
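
As an illustration of the weak-coupling scheme cited above, the following sketch computes the Berendsen velocity scaling factor, lambda = sqrt(1 + (dt/tau)(T0/T - 1)), and shows how the instantaneous temperature relaxes towards the bath temperature. It assumes that the 0.5 ps coupling constant applies to the heat bath and uses the 1 fs time step from the text; the function names are hypothetical, and the relaxation loop ignores the actual dynamics of the system, so it only shows the behaviour of the thermostat term itself.

```python
import math


def berendsen_lambda(t_inst, t_bath, dt_ps=0.001, tau_ps=0.5):
    """Velocity scaling factor for one step of Berendsen weak coupling.

    t_inst: instantaneous temperature (K); t_bath: bath temperature (K);
    dt_ps: time step (ps); tau_ps: coupling constant (ps).
    """
    return math.sqrt(1.0 + (dt_ps / tau_ps) * (t_bath / t_inst - 1.0))


def relax_temperature(t_start, t_bath, n_steps, dt_ps=0.001, tau_ps=0.5):
    """Temperature after n_steps if only the thermostat acted on the system.

    Scaling velocities by lambda scales the kinetic temperature by lambda**2.
    """
    t = t_start
    for _ in range(n_steps):
        t *= berendsen_lambda(t, t_bath, dt_ps, tau_ps) ** 2
    return t


if __name__ == "__main__":
    # A 400 ps run with a 1 fs step, saved every 2 ps (values from the text).
    n_steps = round(400 / 0.001)
    n_frames = round(400 / 2)
    print(n_steps, n_frames)
    # Hypothetical temperature jump from 300 K to a 600 K bath over 0.5 ps.
    print(relax_temperature(300.0, 600.0, 500))
```

With these settings the deviation from the bath temperature decays by a factor of (1 - dt/tau) per step, so it falls to roughly 1/e of its initial value after one coupling constant.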
Structural Analyses


Although some complicated algorithms have been proposed to measure the structural
similarity between proteins (47, 48), the root-mean-square deviation (RMSD)
remains the simplest measure for closely related proteins (49). For each MD
simulation, the RMSDs of the trajectories recorded at 2 ps intervals were
calculated for the backbone Cα atoms of the entire protein, the substrate binding
sites S1 and S2, and domains I, II, and III during the course of the 400 ps MD
simulations with reference to the respective starting structure according to Koehl
(50). The RMSDs were calculated after optimal superimposition of the coordinates to
remove translational and rotational motion (51). Secondary structures were assigned
based on DSSP (45), in which the pattern recognition of hydrogen bonds is
correlated with geometrical features. The default hydrogen bonding energy criterion
of -0.5 kcal/mol was used. Accessible surface areas (ASAs) of the substrate binding
sites S1 and S2 and the distances between the sulfur atom of the nucleophilic Cys
residue and the Nε2 of the general acid-base catalyst His residue and between the
Cζ atom of the totally conserved Arg40 from S2 and the O' atom of the totally
conserved Asp186 from the extended loop connecting domains II and III (numbered as
in MproT) for each structure were also recorded as a function of MD simulation
time. The average secondary structure content was defined as the ratio of the
number of residual H bonds at time t to the total number of H bonds in the starting
structure.

Figure 3: The RMSDs of the backbone Cα for (A) the entire protein, (B) substrate
binding site S1, (C) substrate binding site S2, (D) domain I, (E) domain II, and
(F) domain III of MproT, MproH, MproST, and MproSH with reference to their
respective starting structures during the 400 ps MD simulations at 300, 400, and
600 K.

Figure 4: Secondary structures predicted according to DSSP (45) as a function of MD
simulation time for (A) MproT, (B) MproH, (C) MproST, and (D) MproSH. α-Helix,
β-sheet, turn, and coil are shown in red, light yellow, blue, and green,
respectively.
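
The RMSD calculation after optimal superimposition, as described in this section, can be sketched as follows. The study does not state which superposition algorithm was used, so the Kabsch procedure shown here is an assumption, as are the function names; the code relies on NumPy and expects coordinates as (N, 3) arrays of matched atoms, for example the backbone Cα atoms of one frame and of the starting structure.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between coordinate sets P and Q, both (N, 3), after optimal
    rigid-body superposition of P onto Q (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translational motion by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # The SVD of the covariance matrix yields the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def rmsd_trajectory(frames, reference):
    """RMSD of every frame with respect to a reference structure."""
    return [kabsch_rmsd(frame, reference) for frame in frames]
```

Applied to the frames saved every 2 ps, `rmsd_trajectory` would give one curve of the kind plotted against simulation time; restricting the input to the atoms of a single domain or binding site gives the corresponding per-region curve.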

Results and Discussion 

The Homology Models of MproST and MproSH

Usually, an optimal amino acid sequence alignment based on the conserved structural
regions is essential to the success of homology modeling. The results of the
pairwise amino acid sequence alignment of MproT, MproH, and MproS are given in
Figure 2. There are 301, 300, and 306 residues in MproT, MproH, and MproS,
respectively. The residue corresponding to Ala46 in domain I of MproS and those
corresponding to Asp248, Ile249, and Gln273 in domain III of MproS are missing in
both MproT and MproH. In addition, there are one and two extra residues at the
C-terminus of MproS compared with MproT and MproH, respectively. Both the general
acid-base catalyst and the nucleophile residue of these three proteins are totally
conserved, with the general acid-base catalyst His41 located in a highly conserved
signature sequence (LNGLWLXDXVXCPRHVI) of domain I and the nucleophilic Cys144 of
MproT and MproH or Cys145 of MproS in the highly conserved signature sequence
(TIXGSFXXGXCGSXG) of domain II (Xs indicate the nonconserved residues). The amino
acid sequence identities among these three proteins are summarized in Table I.
MproT and MproH show the highest amino acid identity (60.80%), whereas MproH and
MproS exhibit the lowest (40.19%). MproS shows slightly higher amino acid identity
to MproT than to MproH, indicating that the structure of MproS may be more similar
to that of MproT than to that of MproH. Comparing the three domains among these
three proteins, domain II has the highest amino acid identity, whereas domain III
shows the lowest.
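
For readers who wish to reproduce identity values of the kind reported in Table I, a minimal sketch of a percent-identity calculation over an existing pairwise alignment is given below. The convention used here (identical residues divided by the number of aligned, gap-free columns) is an assumption, since the text does not state which denominator was used, and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns containing a gap in either sequence are skipped, so the
    denominator is the number of residue-residue columns (an assumed
    convention; other definitions divide by the shorter sequence length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no residue-residue columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments: 4 identical out of 5 compared columns.
    print(percent_identity("ACDE-F", "ACDQGF"))
```

Per-domain identities follow by slicing the aligned sequences at the domain boundaries before calling the function.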

The level of similarity between MproS and MproT as well as between MproS and
MproH allowed us to construct two homology models of MproS (denoted MproST and
MproSH) by the comparative approach, and the results are illustrated in
Figure 1C and D. The quality of the geometry and of the stereochemistry of the
protein structures was validated using the Homology/ProStat/Struct_Check command
of the Insight II program. A total of 97 and 96% of the backbone dihedral angle
(φ and ψ) densities are located within the structurally favorable regions of the
Ramachandran plot for MproST and MproSH, respectively (data not shown). The
calculation of the main chain torsion angles (χ1 and χ2) of these proteins showed
no severe distortion of the backbone geometry. In addition, all bond lengths and
angles of both homology models are located within reasonable ranges. Moreover, the
homology models MproST and MproSH constructed in this work are very similar to the
3D models proposed by Lee et al. (16) and Anand et al. (14), respectively. This
evidence indicates that the quality of these homology models should be reliable.

The results of homology modeling show that both MproST and MproSH exhibit three
distinct domains, indicating that they adopt folds similar to those of MproT and
MproH, respectively. However, the secondary structures of both MproST and MproSH
predicted according to DSSP (45) are less conserved compared with those of MproT
(Fig. 1A) and MproH (Fig. 1B), particularly in domain III. This is consistent with
the results of the amino acid sequence alignment, which show that domain III
exhibits the lowest sequence identity among these proteins compared with domains I
and II. Whereas domains I and II are separated by a catalytic cleft, domains II and
III are loosely connected by a long loop (residues 184-199 in both MproT and MproH
and residues 185-200 in MproS) in all structures. Although showing the least
structural identity, domain III, a globular cluster of 5, 5, 4, and 2 helices in
MproT, MproH, MproST, and MproSH, respectively (Fig. 1), has been implicated in the
proteolytic activity of Mpro (13). Comparing the two crystal structures, MproT and
MproH, and the two homology models, MproST and MproSH, we found that domain I of
MproS is more similar to that of MproH, while domains II and III of MproS are more
similar to those of MproT. The low sequence identity and secondary structure
similarity in domain III among these proteins found in the present study, together
with the previous finding that deletion of 33, 28, and 34 C-terminal amino acid
residues from the recombinant Mpro of IBV, MHV, and HCoV, respectively, resulted in
dramatic losses of proteolytic activity, suggest that domain III may play a minor
role in proteolytic activity through an undefined mechanism (13). The putative
substrate binding sites S1 and S2 of MproST and MproSH are also located in a
catalytic cleft between domains I and II (Fig. 1C and D) and are nearly identical
to those of MproT and MproH (Fig. 1A and B). This indicates that MproS may follow
substrate binding mechanisms similar to those of MproT and MproH, allowing us to
design anti-SARS drugs by screening known proteinase inhibitors. A good example has
been given by Anand et al. (14). They proposed a 3D structural model of MproS based
on the crystal structure of MproH and further recommended the use of a rhinovirus
inhibitor (codename AG7088), which is already in clinical trials as an anti-common
cold drug, as a potential model compound for the design of anti-SARS drugs. In
addition, Lee et al. (16) have docked 16 available antiviral drugs from the NCI
database to the structural model of MproS and found that four of them, with the
trade names Nevirapine, Glycovir, Virazole, and Calanolide A, fit well in the
substrate binding cleft of their 3D model of MproS.

Figure 5: The ASAs of the substrate binding sites S1 and S2 at (A) 300, (B) 400,
and (C) 600 K as a function of MD simulation time for MproT, MproH, MproST, and
MproSH.

Molecular Dynamics Simulations 

The structural changes of the whole protein, substrate binding sites S1 and S2, and
domains I, II, and III of MproT, MproH, MproST, and MproSH were evaluated by
plotting the main-chain Cα RMSDs at different temperatures as a function of
simulation time, and the results are shown in Figure 3A-F, respectively. At 300 K,
the overall RMSDs of these proteins all converged below 3 Å, which is in good
agreement with the results of previous MD simulations (16). In addition, the
increases of the overall RMSDs of these proteins at 400 and 600 K followed a
similar pattern, except for MproH, whose overall RMSD reached 9 Å at 600 K, whereas
those of the other three proteins reached only 6 Å. This indicates that MproH may
undergo a more dramatic overall structural change at high temperature. By comparing
the RMSDs of the substrate binding sites S1 and S2 at various temperatures
(Fig. 3B and C), we found that S1 exhibits higher structural integrity than S2.
This is attributed to the fact that S2 is located at the open mouth of the
catalytic cleft between domains I and II and is fully solvent-exposed, whereas S1
is situated at the very bottom of this cleft and is well protected by the
hydrophobic core. The higher structural variation of S2 makes it flexible enough to
accommodate a bulky hydrophobic residue from the substrate. Furthermore, S2 of
MproH undergoes a more dramatic structural change at higher temperatures than S2 of
the other proteins, indicating that MproH may lose its binding affinity towards
various substrates or inhibitors more easily than the other three Mpro.

Comparing the RMSD values in Figure 3D-F, we found that domains I and II of
MproT, MproH, MproST, and MproSH follow similar dynamic behaviors, whereas
domain III of these proteins shows different structural variations over the entire
simulation time course. This result is in good agreement with the results of the
amino acid sequence alignment and homology modeling, which show that domain III of
these proteins exhibits the least structural similarity among the three domains.
The secondary structure propensity of these proteins was predicted according to
DSSP (45) during the entire MD courses at various temperatures, and the results are
shown in Figure 4. The values of the average secondary structure content for each
secondary structure element in these proteins are summarized in Table II. As
expected, domain III loses its helical content faster than domains I and II lose
their sheet content in all cases. The high dielectric constant of the explicit
water system may increase the opportunity for hydrogen bonding between amide
protons and surrounding solvent molecules, which simultaneously promotes
intermolecular hydrogen bonding and therefore destabilizes the structural integrity
of the helices in domain III. From the analyses of the average secondary structure
contents (Table II) and the secondary structure propensities during the MD time
courses (Fig. 4), we estimated that the thermal unfolding of the helices in domain
III of both MproT and MproH follows the order CIII → EIII → BIII → DIII → AIII.
Helix AIII is mainly composed of nonpolar residues and forms an interior
hydrophobic core in domain III, which in turn restricts its solvent exposure, and
it thus maintains higher helical content than the other helices. The ASA of each
residue in helix AIII is nearly zero (data not shown), again indicating that the
hydrophobic environment around helix AIII may protect it from forming
intermolecular hydrogen bonds with water molecules. Furthermore, the result of the
amino acid sequence alignment shows that helix AIII exhibits higher sequence
identity than the other helices in domain III among these proteins, which may also
emphasize the importance of helix AIII in maintaining the structural integrity of
the globular domain III of Mpro.
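
The average secondary structure content reported for each element can be illustrated with a short sketch. It follows the definition given in the Methods (residual hydrogen bonds at time t divided by the hydrogen bonds in the starting structure) and assumes, as a simplification not stated in the text, that the per-frame ratios are combined by a plain arithmetic mean over the saved frames; the counts in the example are hypothetical.

```python
def average_content(residual_hbonds, initial_hbonds):
    """Average secondary structure content (%) of one element.

    residual_hbonds: number of H bonds of the element surviving in each
    saved frame; initial_hbonds: number of H bonds of that element in the
    starting structure.
    """
    if initial_hbonds <= 0:
        raise ValueError("the starting structure must contain H bonds")
    if not residual_hbonds:
        raise ValueError("at least one frame is required")
    ratios = [n / initial_hbonds for n in residual_hbonds]
    return 100.0 * sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Hypothetical helix with 8 H bonds that unfolds steadily over 4 frames.
    print(average_content([8, 6, 4, 2], 8))
```

A value close to 100 indicates an element that keeps nearly all of its starting hydrogen bonds throughout the run, while a value near zero indicates an element that is lost early.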
Table II

Average secondary structure content (%) for each secondary structure element in MproT, MproH, MproST, and MproSH.

Element  MproT               MproH               MproST              MproSH
         300 K 400 K 600 K   300 K 400 K 600 K   300 K 400 K 600 K   300 K 400 K 600 K
aI          75    65    10      55    43     7      60    73    43      99    90    20
bI          72    64    14      64    18     9      93    85    62      95    84    22
cI          55    67    26      85    48    18      65    62    53      90    75    27
AI           3    16     2       3     3     0       -     -     -      90    85    12
dI          59    61     8      40    53     8       -     -     -      68    46     3
eI          53    45     5      50    41     5       -     -     -      91    74    23
fI          77    76    20      83    72    12      50    47    25      55    65     8
aII         60    50    15      61    58    17      37    21     1      56    46     5
bII         45    36    10      54    53    10      36    19     2      86    55     1
cII         45    39    22      44    39     7      33    15    14      87    76     5
dII         42    46    19      60    45    10      88    76    52      82    60     5
eII         49    37    18      65    34     3      86    76    53      81    44     8
fII         22    20     3      22     5     1      52    48    19      49     8     5
AIII        85    56    13      69    63    14      56    44    34      61    42     8
BIII        78    45     5      71    39     6       -     -     -       -     -     -
CIII        37    13     1       1     0     0      78    36     6       -     -     -
DIII        93    63     9      96    72     9       -     -     -       -     -     -
EIII        92    76     3      35    53     3      90    59    22      66    58     4


In contrast to the specific unfolding order of the helices in domain III, there is
no particular unfolding order of the sheets in domains I and II (Fig. 4 and
Table II). The packing of the sheets in domains I and II resembles a sandwich, and
the catalytic cleft is located in the middle of this well-organized structure. The
nucleophilic Cys144 is located in the center of this catalytic cleft, and some of
the residues forming the substrate binding site S1 are distributed among some of
the sheets in domains I and II. Thus, in order to maintain the proteolytic
activity, these sheets have to preserve their secondary structural integrity. Most
of the structural variations in domains I and II at high temperatures result from
the fluctuation of the outer loops, which are fully exposed to the solvent. A
previous study has shown that the region around residues 10-20 (corresponding to
sheet bI in domain I) is relatively rigid and the region around residues 265-287
(corresponding to the loop connecting helices DIII and EIII in domain III) is more
flexible than the other regions of MproST (16). The present results also indicate
that the structural network formed by the sheets in domains I and II is relatively
stable during the MD simulations compared with the network formed by the helices in
domain III. A short helix AI is observed on the outer surface of domain I in the
crystal structures of MproT and MproH (Fig. 1A and B), whereas this helix is
missing in the homology models MproST and MproSH (Fig. 1C and D). During the MD
simulations, this helix disappeared very quickly owing to contact with the
surrounding water molecules.

Figure 6: The distance between the Cζ atom of the totally conserved Arg40 from the
substrate binding site S2 and the O atom of the totally conserved Asp186 from the
intervening loop connecting domains II and III (residues numbered as in MproT) as a
function of MD simulation time for MproT, MproH, MproST, and MproSH at 300, 400,
and 600 K.

Figure 7: The distance between the sulfur atom of the nucleophilic Cys residue and
the Nε2 of the general acid-base catalyst His residue as a function of MD
simulation time for MproT, MproH, MproST, and MproSH at 300, 400, and 600 K. The
snapshots of MproT taken at 82 ps, 400 K and at 100 ps, 600 K are shown in Figure
8A and B, respectively.

It has been shown previously that, similarly to 3Cpro (52-54), specific substrate
binding by Mpro is ensured by the well-defined S1 and S2 substrate binding pockets
(15). In addition, it has also been shown that the imidazole side chain of the
conserved His residue, which is located in the center of a hydrophobic pocket,
interacts with the P1 carboxamide side chain of the substrate. This specific
interaction is generally considered to determine the picornavirus 3Cpro specificity
for a Gln residue at P1 (52-54). The totally conserved His162 of both MproT and
MproH or His163 of MproS is located at the very bottom of this hydrophobic pocket,
which is formed by the totally conserved residue Phe139 of both MproT and MproH or
Phe140 of MproS and the main-chain atoms of Ile140, Leu164, Glu165, and His171 of
MproT, Ile140, Ile164, Glu165, and His171 of MproH, or Leu141, Met165, Glu166, and
His172 of MproS. The totally conserved Glu165 of MproT and MproH or Glu166 of MproS
forms an ion pair with the totally conserved His171 of MproT and MproH or His172
of MproS (15). This salt bridge is itself on the periphery of these molecules,
forming part of the outer wall of the substrate binding site S1. Figure 5 shows the
ASAs of the substrate binding sites S1 and S2 of MproT, MproH, MproST, and MproSH
at various temperatures. In general, S2 exhibits higher ASAs than S1 during the MD
simulations. In addition, S1 was found to maintain its structural integrity,
whereas S2 exhibits more structural variation during the MD simulations. This is
attributed to the fact that S2 is located at the open mouth of the catalytic cleft
between domains I and II and thus is fully exposed to the surrounding solvent,
whereas S1 is situated at the very bottom of this cleft and is consequently
protected by the hydrophobic core. The higher structural variation of S2 is
necessary for the proteolytic activity of Mpro because it makes S2 flexible enough
to accommodate a bulky hydrophobic residue from the substrate and further allows
the substrate to form close contacts with the substrate binding cleft formed by S1
and S2 of this enzyme.
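
To make the notion of accessible surface area concrete, the sketch below estimates per-atom ASA with a numerical sphere-point method of the Shrake-Rupley type. The text does not state how ASAs were computed in Insight II, so the algorithm, the 1.4 Å probe radius, the number of test points, and the atomic radii in the example are all assumptions chosen for illustration.

```python
import math


def sphere_points(n):
    """n roughly evenly spaced unit vectors (golden-spiral construction)."""
    inc = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for k in range(n):
        y = 1.0 - (2.0 * k + 1.0) / n
        r = math.sqrt(1.0 - y * y)
        phi = k * inc
        pts.append((math.cos(phi) * r, y, math.sin(phi) * r))
    return pts


def accessible_surface_areas(atoms, probe=1.4, n_points=200):
    """Per-atom ASA in squared angstroms.

    atoms: list of (x, y, z, radius) tuples in angstroms. A test point on
    the probe-expanded sphere of atom i counts as accessible if it lies
    outside the probe-expanded spheres of all other atoms.
    """
    pts = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe
        accessible = 0
        for px, py, pz in pts:
            x, y, z = xi + big_r * px, yi + big_r * py, zi + big_r * pz
            buried = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                cut = rj + probe
                if (x - xj) ** 2 + (y - yj) ** 2 + (z - zj) ** 2 < cut * cut:
                    buried = True
                    break
            if not buried:
                accessible += 1
        areas.append(4.0 * math.pi * big_r * big_r * accessible / n_points)
    return areas
```

Summing the per-atom values over the residues that form S1 or S2 in each saved frame would give a time course of the binding-site ASA; a pocket buried at the bottom of a cleft yields small values because most of its test points fall inside neighbouring atoms.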


Figure 8: Snapshots of MP ro T taken at (A) 82 ps, 400 K and (B) 100 ps, 600 K.
Secondary structures assigned according to DSSP (43) are shown as in Figure 1.
Substrate binding sites S1 and S2 are represented as CPK and colored in indigo and
brown, respectively. The nucleophilic Cys144 and the general acid-base catalyst
His41 are shown in purple as sticks. The totally conserved residues Arg40 and
Asp186, which form the electrostatic interaction in the native structure of MP ro T,
are shown in grey as sticks. These structures were visualized with the Insight II
program.

A previous study has indicated that the loop connecting domains II and III (residues
184-199) plays an important role in maintaining the proteolytic activity of MP ro T
(15). This intervening loop is located adjacent to the substrate binding site S2.
Moreover, the totally conserved Arg40 from S2 forms an electrostatic interaction
with the totally conserved Asp186 from this extended loop (15). In order to
investigate the importance of this electrostatic interaction in maintaining the structural
integrity of S2 during the MD simulations, the distance between the Cδ atom of
Arg40 and the Cγ atom of Asp186 (residues numbered as in MP ro T) was measured for
each protein, and the results are shown in Figure 6. In addition, the distance
between the S atom of the nucleophilic Cys residue and the Nε2 atom of the general
acid-base catalyst His residue was recorded for each structure as a function of MD
simulation time, and the results are given in Figure 7. At higher temperatures, these
distances all increase significantly, indicating that both the electrostatic
interaction between Arg40 and Asp186 and the catalytic Cys-His pair are disrupted
upon heating. Interestingly, the distance between the Cys-His pair increases
dramatically for MP ro T between 75 and 90 ps at 400 K. In order to compare the
partially unfolded structure with the totally unfolded one, snapshots of MP ro T at
82 ps, 400 K (the distances between Arg40 and Asp186 and between His41 and Cys144
are 7.79 and 9.04 Å, respectively) and at 100 ps, 600 K (the distances between Arg40
and Asp186 and between His41 and Cys144 are 11.18 and 16.08 Å, respectively) were
generated, as shown in Figure 8A and B, respectively. According to these
snapshots, the former structure still maintains most of its secondary structure,
whereas most of the secondary structure has disappeared in the latter. In addition,
the substrate binding sites S1 and S2 still maintain their structural integrity in
the partially unfolded MP ro T, whereas these two binding sites are shifted and
destroyed in the totally unfolded MP ro T. This indicates that the electrostatic
interaction between Arg40 and Asp186 plays an important role in maintaining the
packing of S1 and S2, thus preserving the proteolytic activity of this enzyme (15).
From the above results, we may conclude that MP ro T may still maintain its
proteolytic activity while it is partially unfolded and that the electrostatic
interaction between Arg40 and Asp186 functions as a gate controlling the open and
closed states of the substrate binding site S2. Previous MD simulations have shown
that MP ro ST complexed with an inhibitor is, on average, less flexible than the free
enzyme in either the monomeric or the dimeric form (16). Our simulation results also
indicate that, without the protection of a bound inhibitor, water molecules may
enter S2 and further penetrate S1 when the electrostatic interaction between Arg40
and Asp186 is destroyed at high temperatures, resulting in the distortion and
destruction of the packing of these two sites, which are mainly lined with
hydrophobic residues.
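
The distance monitoring described above (one atom pair followed as a function of
simulation time) can be sketched as follows. This is only an illustration: the frame
layout (a dict of atom label to coordinates), the labels "Arg40" and "Asp186" used as
keys, the toy coordinates, and the 4.0 Å cutoff are all assumptions made for the
example and are not taken from the simulations reported here.

```python
import math

def distance_series(frames, atom_i, atom_j):
    """Distance (Angstrom) between two labelled atoms in every frame.

    frames: list of dicts mapping a hypothetical atom label to (x, y, z).
    """
    return [math.dist(frame[atom_i], frame[atom_j]) for frame in frames]

def first_time_above(times_ps, distances, cutoff):
    """Return the first time (ps) at which the distance exceeds cutoff, else None."""
    for t, d in zip(times_ps, distances):
        if d > cutoff:
            return t
    return None

# Toy coordinates, invented purely to exercise the functions.
frames = [
    {"Arg40": (0.0, 0.0, 0.0), "Asp186": (0.0, 0.0, 3.0)},
    {"Arg40": (0.0, 0.0, 0.0), "Asp186": (3.0, 4.0, 0.0)},
    {"Arg40": (0.0, 0.0, 0.0), "Asp186": (3.0, 4.0, 12.0)},
]
times_ps = [0, 1, 2]
series = distance_series(frames, "Arg40", "Asp186")
print(series)                                   # [3.0, 5.0, 13.0]
print(first_time_above(times_ps, series, 4.0))  # 1
```

The same two functions could be applied to the Cys-His pair; the cutoff only marks
when a contact is considered lost and would need to be chosen for the system at hand.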

In conclusion, homology models of MP ro S were successfully constructed based
on the crystal structures of both MP ro T (15) and MP ro H (14) by a comparative
approach. Both MP ro ST and MP ro SH exhibit folds similar to those of their
respective templates, MP ro T and MP ro H. Three distinct functional domains, as well
as an intervening loop from residues 184 to 199 connecting domains II and III, are
found in these homology models, as in the template proteins. A catalytic cleft
containing the substrate binding sites S1 and S2 between domains I and II is also
observed in these homology models. S2 undergoes more dramatic structural
changes than S1 because it is located at the open mouth of the catalytic cleft and is
fully exposed to the solvent, whereas S1 is situated at the very bottom of this cleft
and is well protected by the hydrophobic core. The unfolding of these proteins
begins at domain III, where the structure is least conserved among these proteins.
MP ro may still maintain its proteolytic activity while it is partially unfolded. The
electrostatic interaction between the totally conserved Arg40 from the substrate
binding site S2 and the totally conserved Asp186 from the intervening loop
between domains II and III (residues numbered as in MP ro T) plays an important
role in maintaining the structural integrity of both S1 and S2.